Sentence Level Information Patterns for Novelty Detection
Author: Xiaoyan Li
Abstract
Sentence Level Information Patterns for Novelty Detection. July 2006. Xiaoyan Li, B.E., Tsinghua University; M.E., Tsinghua University; Ph.D., University of Massachusetts at Amherst. Directed by: Professor W. Bruce Croft.
The detection of new information in a document stream is an important component of many potential applications. In this thesis, a new novelty detection approach based on the identification of sentence level information patterns is proposed. Given a user's information need, some information patterns in sentences, such as combinations of query words, sentence lengths, named entities and phrases, and other sentence patterns, may contain more important and relevant information than single words. The work of the thesis includes three parts. First, we redefine "what is novelty detection" in light of the proposed information patterns. Examples of several different types of information patterns are given, corresponding to different types of users' information needs. Second, we analyze why the proposed information pattern concept has a significant impact on novelty detection. A thorough analysis of sentence level information patterns is carried out on data from the TREC novelty tracks, covering sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, we present how we perform novelty detection based on information patterns, which focuses on the identification of previously unseen query-related patterns in sentences. A unified pattern-based approach to novelty detection is presented for both specific NE topics and more general topics. Experiments on novelty detection were carried out on data from the TREC 2002, 2003 and 2004 novelty tracks. Experimental results show that the proposed approach significantly improves the performance of novelty detection for both specific and general topics, and therefore the overall performance for all topics, in terms of precision at top ranks. Future research directions are suggested.
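The idea of flagging sentences that carry previously unseen query-related patterns can be illustrated with a minimal sketch. This is not the thesis's actual method: the function names `extract_patterns` and `detect_novel` are hypothetical, and capitalized non-initial tokens are used only as a crude stand-in for named entities.

```python
import re


def extract_patterns(sentence, query_terms):
    """Return a set of (kind, value) patterns found in a sentence.

    Assumption for illustration: a "pattern" is either a query word or a
    capitalized token that is not the first token of the sentence (a rough
    proxy for a named entity).
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    patterns = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in query_terms:
            patterns.add(("query", tok.lower()))
        elif i > 0 and tok[0].isupper():
            patterns.add(("entity", tok))
    return patterns


def detect_novel(sentences, query_terms):
    """Keep query-related sentences that contain at least one unseen pattern."""
    seen = set()
    novel = []
    for sentence in sentences:
        patterns = extract_patterns(sentence, query_terms)
        # Skip sentences that do not mention the query at all.
        if not any(kind == "query" for kind, _ in patterns):
            continue
        if patterns - seen:
            novel.append(sentence)
        seen |= patterns
    return novel


stream = [
    "The storm hit Florida on Monday.",
    "The storm hit Florida again.",
    "Later the storm reached Georgia.",
]
novel_sentences = detect_novel(stream, {"storm"})
```

In this toy stream the second sentence repeats patterns already seen, so only the first and third sentences are kept.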
Similar resources
An information-pattern-based approach to novelty detection
In this paper, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, ‘‘novelty’’ is redefined based on the proposed information patterns, and several different types of information patterns are given corresponding to different types of users’ information needs. Second, a thorough analysis of sentence level information patterns is...
Document-to-Sentence Level Technique for Novelty Detection
Novelty identification is used to distinguish novel data from an incoming stream of documents. In this study, we propose a new methodology for document-level novelty identification using a document-to-sentence-level strategy. This work first splits a document into sentences, decides the novelty of every sentence, then registers the record-level novelty score in view of an al...
Graph-Based Text Representation For Novelty Detection
We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph produced by using sentence-level term distances and pointwise mutual information can ...
Novelty Detection via Answer Updating
The detection of new and novel information in a document stream is an important component of potential applications. This paper describes an answer updating approach to novelty detection at the sentence level. Specifically, we explore the use of question-answering techniques for novelty detection. New information is defined as new/previously unseen answers to questions representing a user’s info...
Exploring fact-focused relevance and novelty detection
Purpose – Automated sentence-level relevance and novelty detection would be of direct benefit to many information retrieval systems. However, the low level of agreement between human judges performing the task is an issue of concern. In previous approaches, annotators were asked to identify sentences in a document set that are relevant to a given topic, and then to eliminate sentences that do n...
Publication date: 2006